Large Margin Taxonomy Embedding with an Application to Document Categorization

نویسندگان

  • Kilian Weinberger
  • Olivier Chapelle
چکیده

Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

متن کامل

Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...

متن کامل

Search Query Categorization at Scale

State of the art query categorization methods usually exploit web search services to retrieve the best matching web documents and map them to a given taxonomy of categories. This is effective but impractical when one does not own a web corpus and has to use a 3 party web search engine API. The problem lies in performance and in financial costs. In this paper, we present a novel, fast and scalab...

متن کامل

Steganalysis of embedding in difference of image pixel pairs by neural network

In this paper a steganalysis method is proposed for pixel value differencing method. This steganographic method, which has been immune against conventional attacks, performs the embedding in the difference of the values of pixel pairs. Therefore, the histogram of the differences of an embedded image is di_erent as compared with a cover image. A number of characteristics are identified in the di...

متن کامل

Hierarchical Multi-Label Text Categorization with Global Margin Maximization

Text categorization is a crucial and wellproven method for organizing the collection of large scale documents. In this paper, we propose a hierarchical multi-class text categorization method with global margin maximization. We not only maximize the margins among leaf categories, but also maximize the margins among their ancestors. Experiments show that the performance of our algorithm is compet...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008